Novel Prefix Tri-Literal Word Analyser: Rule-Based Approach

Authors

Mohammed M. Abu Shquier

Khaled M. Alhawiti

Abstract

Corresponding Author: Mohammed M. Abu Shquier, Department of Information Science, University of Tabuk, Tabuk, KSA. Email: [email protected]

Arabic stemming is a technique for finding the stem or lexical root of an Arabic word by eliminating the affixes (prefixes, infixes and suffixes) attached to its root. Several approaches have been implemented to generate the stem of Arabic words at a given level of analysis: the root-based approach, the stem-based approach and the statistical approach. Arabic is a Semitic language, which means that it is a derivational rather than a concatenative language. In this study we designed and implemented an Arabic triliteral morphological analyser that can analyse Classical Arabic and Modern Standard Arabic (MSA) effectively, and that handles vowelised, semi-vowelised and non-vowelised text. The system can be integrated with other applications so that a large number of people can benefit from it. One shortcoming of the developed system is that the output of the morphological analyser may contain several alternative solutions, which leads to extraction ambiguity.
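The affix-elimination idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' analyser: the affix lists, the function names `strip_diacritics` and `candidate_roots`, and the rule "keep any three-letter remainder" are assumptions chosen for illustration only. The sketch handles prefixes and suffixes but not infixes, and it returns every three-letter candidate it finds, which also shows how several alternative solutions can arise for one word.

```python
import re

# Arabic short-vowel marks (U+064B..U+0652) plus tatweel (U+0640).
# Removing them lets vowelised and non-vowelised input be treated alike.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")

# Illustrative affix lists only (assumption); not the rule set from the paper.
# The empty string means "no prefix" / "no suffix".
PREFIXES = ["", "ال", "وال", "بال", "و", "ف", "ب", "ل", "ي", "ت", "ن", "م", "است"]
SUFFIXES = ["", "ون", "ين", "ان", "ات", "ها", "ة", "ه", "ي", "وا", "ت"]


def strip_diacritics(word):
    """Return the word with vowel marks and tatweel removed."""
    return DIACRITICS.sub("", word)


def candidate_roots(word):
    """Return a sorted list of three-letter remainders left after removing
    one listed prefix and one listed suffix (either may be empty)."""
    bare = strip_diacritics(word)
    candidates = set()
    for prefix in PREFIXES:
        if not bare.startswith(prefix):
            continue
        rest = bare[len(prefix):]
        for suffix in SUFFIXES:
            if not rest.endswith(suffix):
                continue
            core = rest[:len(rest) - len(suffix)]
            if len(core) == 3:
                candidates.add(core)
    return candidates and sorted(candidates) or []


if __name__ == "__main__":
    # One candidate: prefix "ي" and suffix "ون" are removed.
    print(candidate_roots("يكتبون"))
    # Two candidates: the word can be split as root + "ت" or as "و" + root.
    print(candidate_roots("وعدت"))
```

In this sketch the second word yields two candidates because both a leading "و" and a trailing "ت" are plausible affixes. A real analyser would need further rules or a root lexicon to choose between them.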

Similar resources

A Rule based Stemming Method for Multilingual Urdu Text

Urdu is the national language of Pakistan, and more than 200 million people use it for verbal and written communication. A large amount of unstructured Urdu textual data exists in the world, and useful information can be extracted from it by applying data mining techniques. However, Urdu seriously lacks the processing capabilities needed to develop innovative systems based on the language. In this paper, auth...

Why Catalan-Spanish Neural Machine Translation? Analysis, comparison and combination with standard Rule and Phrase-based technologies

Catalan and Spanish are two related languages, given that both derive from Latin. They share similarities at several linguistic levels, including morphology, syntax and semantics. This makes them particularly interesting for the MT task. Given the recent appearance and popularity of neural MT, this paper analyzes the performance of this new approach compared to the well-established rule-based and...

Pseudo-Identities and Bordered Words

This paper investigates the notions of θ-bordered words and θ-unbordered words for various pseudo-identity functions θ. A θ-bordered word is a non-empty word u such that there exists a word v which is a prefix of u while θ(v) is a suffix of u. The case where θ is the identity function corresponds to the classical notions of bordered and unbordered words. Here we explore cases where θ is a pseud...

Containing overgeneration in Zulu computational morphology

The development of a large-coverage, computational morphological analyser for Zulu requires the modelling not only of the regular phenomena often associated with word formation, but also the idiosyncratic behaviour that may occur in Zulu morphology. This paper discusses the application of an existing rule-based, finite-state morphological analyser prototype ZulMorph in semi-automating the minin...

A Cohesion Graph Based Approach for Unsupervised Recognition of Literal and Non-literal Use of Multiword Expressions

We present a graph-based model for representing the lexical cohesion of a discourse. In the graph structure, vertices correspond to the content words of a text and edges connecting pairs of words encode how closely the words are related semantically. We show that such a structure can be used to distinguish literal and non-literal usages of multi-word expressions.

Journal title:

Volume 11, Issue

Pages -

Publication date: 2015
